Controllable data synthesis method for grammatical error correction

Abstract

Due to the lack of parallel data for the grammatical error correction (GEC) task, models based on the sequence-to-sequence framework cannot be adequately trained to reach higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types in synthetic data. The first approach corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, and deletion. The second trains error generation models and further filters the decoding results of those models. Experiments on different synthetic data show that an error rate of 40% with the same ratio of error types improves model performance the most. Finally, we synthesize about 100 million items of synthetic data and achieve performance comparable to the state of the art, which uses twice as much data as we use.
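
The first approach, word-level corruption with a fixed probability, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `corrupt`, the `vocab` list used to draw replacement and insertion words, and the `type_ratio` weights are all hypothetical, and the paper may choose candidate words differently. The default `error_rate` of 0.4 simply mirrors the 40% figure mentioned in the abstract.

```python
import random


def corrupt(tokens, vocab, error_rate=0.4, type_ratio=(1, 1, 1), rng=None):
    """Return a noisy copy of `tokens` (hypothetical helper, not from the paper).

    error_rate: probability that any given word is corrupted.
    type_ratio: relative weights for (replace, insert, delete).
    vocab:      assumed pool of words for replacements and insertions.
    """
    rng = rng or random.Random()
    ops = ("replace", "insert", "delete")
    noisy = []
    for tok in tokens:
        # Keep the word unchanged with probability 1 - error_rate.
        if rng.random() >= error_rate:
            noisy.append(tok)
            continue
        # Pick an error type according to the requested ratio.
        op = rng.choices(ops, weights=type_ratio, k=1)[0]
        if op == "replace":
            # May occasionally draw the original word again.
            noisy.append(rng.choice(vocab))
        elif op == "insert":
            # Keep the word and add a random word after it.
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        # "delete": append nothing, so the word is dropped.
    return noisy


if __name__ == "__main__":
    clean = "she goes to school every day".split()
    vocab = ["a", "the", "went", "in", "on"]
    noisy = corrupt(clean, vocab, rng=random.Random(0))
    # A (noisy, clean) pair is one synthetic training example.
    print(" ".join(noisy), "->", " ".join(clean))
```

Under these assumptions, `error_rate` controls how many words are corrupted and `type_ratio` controls the mix of error types, which are the two properties the abstract says the synthesis methods can control.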

Similar resources

Grammatical Error Correction

Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in written text. Earlier attempts at grammatical error correction involve rule-based and classifier approaches, which are limited to correcting only particular types of errors in a sentence. As sentences may contain multiple errors of different types, a practical error correction system should be ab...

Grammatical Error Correction of English as Foreign Language Learners

This study aimed to discover the insight of error correction by implementing two correction systems on three Iranian university students. The three students were invited to write four in-class essays throughout the semester, in which their verb errors and individual-selected errors were corrected using the Code Correction System and the Individual Correction System. At the end of the study, the...

System Combination for Grammatical Error Correction

Different approaches to high-quality grammatical error correction have been proposed recently, many of which have their own strengths and weaknesses. Most of these approaches are based on classification or statistical machine translation (SMT). In this paper, we propose to combine the output from a classification-based system and an SMT-based system to improve the correction quality. We adopt t...

Generating artificial errors for grammatical error correction

This paper explores the generation of artificial errors for correcting grammatical mistakes made by learners of English as a second language. Artificial errors are injected into a set of error-free sentences in a probabilistic manner using statistics from a corpus. Unlike previous approaches, we use linguistic information to derive error generation probabilities and build corpora to correct sev...

Memory-based Grammatical Error Correction

We describe the ’TILB’ team entry for the CONLL-2013 Shared Task. Our system consists of five memory-based classifiers that generate correction suggestions for center positions in small text windows of two words to the left and to the right. Trained on the Google Web 1T corpus, the first two classifiers determine the presence of a determiner or a preposition between all words in a text. The sec...

Journal

Journal title: Frontiers of Computer Science

Year: 2021

ISSN: 1673-7350, 1673-7466

DOI: https://doi.org/10.1007/s11704-020-0286-4